SUMMA: scalable universal matrix multiplication algorithm

Authors

  • Robert A. van de Geijn
  • Jerrell Watts
Abstract

In this paper, we give a straightforward, highly efficient, scalable implementation of common matrix multiplication operations. The algorithms are much simpler than previously published methods, yield better performance, and require less workspace. MPI implementations are given, as are performance results on the Intel Paragon system.
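
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the broadcast-plus-local-update scheme usually described under the SUMMA name, for the case C = AB. It simulates a process grid inside a single Python process rather than using MPI, and it is not the authors' implementation. The function name `summa` and the parameters `grid_rows`, `grid_cols`, and `panel` are hypothetical, chosen for illustration, and the sketch assumes the matrix dimensions divide evenly by the grid dimensions.

```python
def summa(A, B, grid_rows, grid_cols, panel):
    """Simulate C = A * B on a grid_rows x grid_cols process grid.

    A is m x k and B is k x n, given as lists of rows. The k dimension is
    swept in panels of width `panel`; each step stands in for one row
    broadcast of an A panel and one column broadcast of a B panel.
    """
    m, k, n = len(A), len(B), len(B[0])
    assert len(A[0]) == k, "inner dimensions must match"
    # Assumption of this sketch: blocks divide evenly across the grid.
    assert m % grid_rows == 0 and n % grid_cols == 0
    mb, nb = m // grid_rows, n // grid_cols  # size of each local C block

    # Every simulated process (r, c) owns one mb x nb block of C.
    C_local = {(r, c): [[0.0] * nb for _ in range(mb)]
               for r in range(grid_rows) for c in range(grid_cols)}

    for k0 in range(0, k, panel):
        k1 = min(k0 + panel, k)
        for r in range(grid_rows):
            # Stand-in for broadcasting the A panel along process row r.
            a_panel = [A[i][k0:k1] for i in range(r * mb, (r + 1) * mb)]
            for c in range(grid_cols):
                # Stand-in for broadcasting the B panel along process column c.
                b_panel = [B[t][c * nb:(c + 1) * nb] for t in range(k0, k1)]
                # Local update: C_block += a_panel * b_panel.
                block = C_local[(r, c)]
                for i in range(mb):
                    for t in range(k1 - k0):
                        a = a_panel[i][t]
                        for j in range(nb):
                            block[i][j] += a * b_panel[t][j]

    # Gather the local blocks into one matrix so the result can be inspected.
    C = [[0.0] * n for _ in range(m)]
    for (r, c), block in C_local.items():
        for i in range(mb):
            C[r * mb + i][c * nb:(c + 1) * nb] = block[i]
    return C
```

In a real distributed run the two slicing steps would be collective broadcasts within a process row and a process column, and the triple loop would typically be a single call to an optimized local matrix multiply. The sketch keeps only the data movement pattern and the accumulation order.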

Similar resources

Parallel Matrix Multiplication: 2D and 3D (FLAME Working Note #62), Martin Schatz

We describe an extension of the Scalable Universal Matrix Multiplication Algorithms (SUMMA) from 2D to 3D process grids; the underlying idea is to lower the communication volume through storing redundant copies of one or more matrices. While SUMMA was originally introduced for block-wise matrix distributions, so that most of its communication was within broadcasts, this paper focuses on element...

A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers

We present a fast and scalable matrix multiplication algorithm on distributed memory concurrent computers, whose performance is independent of data distribution on processors, and call it DIMMA (Distribution-Independent Matrix Multiplication Algorithm). The algorithm is based on two new ideas; it uses a modified pipelined communication scheme to overlap computation and communication effectivel...

Task-Based Algorithm for Matrix Multiplication: A Step Towards Block-Sparse Tensor Computing

Distributed-memory matrix multiplication (MM) is a key element of algorithms in many domains (machine learning, quantum physics). Conventional algorithms for dense MM rely on regular/uniform data decomposition to ensure load balance. These traits conflict with the irregular structure (block-sparse or rank-sparse within blocks) that is increasingly relevant for fast methods in quantum physics. T...

Algorithmic-Based Fault Tolerance for Matrix Multiplication on Amazon EC2

Cloud computing presents a unique alternative to traditional computing approaches for many users and applications. The goals of this project were to assess the viability of the cloud for scientific computing applications, and to explore fault tolerance as a mechanism for maintaining high performance in this variable and unpredictable environment. Most previous attempts to run scientific computa...

A new parallel matrix multiplication algorithm on distributed-memory concurrent computers

We present a new fast and scalable matrix multiplication algorithm, called DIMMA (Distribution-Independent Matrix Multiplication Algorithm), for block cyclic data distribution on distributed-memory concurrent computers. The algorithm is based on two new ideas; it uses a modified pipelined communication scheme to overlap computation and communication effectively, and exploits the LCM block concept...

Journal:
  • Concurrency - Practice and Experience

Volume 9

Publication date: 1997